Problem Statement

Installing Dependencies

Loading Dataset

Shape of Data

No. of rows (datapoints) - 5,50,068 || No. of columns (features) - 10

Datatype of features

No. of categorical features - 5 || No. of continuous (numerical) features - 5

Unique value counts for each feature

Insights

Missing Values Detection

There are no missing values in the dataset
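A check of this kind can be sketched as follows. The small DataFrame and its column names (`Gender`, `Purchase`) are illustrative stand-ins, not the actual dataset.

```python
import pandas as pd

# Toy stand-in for the dataset; the values are made up for illustration.
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "F"],
    "Purchase": [8370, 15200, 1422, 1057],
})

# Count missing values per column; all zeros means nothing is missing.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Collapse to a single yes/no answer for the whole frame.
has_missing = bool(df.isna().any().any())
print("Any missing values:", has_missing)
```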

Validating duplicate values

There are no duplicate values in the dataset
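One way to validate this is to count fully repeated rows. The sketch below uses a made-up frame; the column names `User_ID` and `Purchase` are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the dataset; the values are made up for illustration.
df = pd.DataFrame({
    "User_ID": [1, 2, 3],
    "Purchase": [100, 200, 300],
})

# duplicated() flags every row that repeats an earlier row exactly.
duplicate_count = int(df.duplicated().sum())
print("Duplicate rows:", duplicate_count)
```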

Statistical Summary

Non-Graphical Analysis

Value counts and Unique Attributes

Graphical Analysis

Univariate Continuous Variable Analysis

Insights

Detecting Outliers
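A common way to flag outliers in a continuous variable is the 1.5 x IQR rule. Whether this analysis used that rule is not stated, so the sketch below is an assumption, and the numbers are made up.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Hypothetical purchase amounts with one obviously extreme value.
purchases = [10, 12, 11, 13, 12, 100]
mask = iqr_outlier_mask(purchases)
print(mask)
```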

Handling Outliers

Univariate Categorical Variable Analysis

Insights

Insights

Insights

Insights

Insights

Insights

Bivariate Analysis

Insights

Let's analyze Gender

Insights

Insights

Insights

Let's analyze City Category across different age brackets

Insights

Correlation Analysis

Insights

4. Answering questions

4.1 Are women spending more money per transaction than men? Why or why not?

Insights

Let's take a sample of the data to verify this

Insights

4.2 Confidence intervals and distribution of the mean of the expenses by female and male customers

Insights

Calculating CIs (90%, 95%, 99%) for purchases by Gender using bootstrapping and the CLT
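A minimal sketch of a bootstrap confidence interval for a mean, using the percentile method. The function name `bootstrap_mean_ci`, the resample count, and the synthetic data are assumptions for illustration; the analysis itself may have built its intervals differently.

```python
import numpy as np

def bootstrap_mean_ci(values, confidence=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap CI for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row of `idx` indexes one resample drawn with replacement.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    alpha = 1.0 - confidence
    lower, upper = np.percentile(
        means, [100 * alpha / 2, 100 * (1 - alpha / 2)]
    )
    return float(lower), float(upper)

# Synthetic purchase amounts standing in for one gender group.
rng = np.random.default_rng(42)
synthetic_purchases = rng.normal(loc=9000, scale=5000, size=500)

intervals = {
    level: bootstrap_mean_ci(synthetic_purchases, confidence=level)
    for level in (0.90, 0.95, 0.99)
}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

Because the same seed is reused for every level, the three intervals come from the same set of resampled means, so a higher confidence level always produces an interval at least as wide.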

Calculating 90% CI

Insights

Calculating 95% CI

Insights

Calculating 99% CI

Insights

4.3 Are confidence intervals of average male and female spending overlapping? How can Walmart leverage this conclusion to make changes or improvements?

Answer
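Whether two confidence intervals overlap can be checked mechanically. The sketch below uses placeholder bounds, not the values computed in this analysis, and the helper name `intervals_overlap` is hypothetical.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a=(low, high) and b=(low, high)
    share at least one point."""
    (a_low, a_high), (b_low, b_high) = a, b
    return max(a_low, b_low) <= min(a_high, b_high)

# Placeholder bounds, chosen only to show both outcomes.
group_a = (100.0, 120.0)
group_b = (115.0, 140.0)
group_c = (125.0, 150.0)

print(intervals_overlap(group_a, group_b))  # the two ranges share 115..120
print(intervals_overlap(group_a, group_c))  # the ranges are disjoint
```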

Conclusion to make changes and improvements

4.4 Results when the same activity is performed for Married vs Unmarried

Insights

Calculating CIs (90%, 95%, 99%) for purchases by Marital_Status using bootstrapping and the CLT

Insights

Calculating 90% CI

Insights

Calculating 95% CI

Insights

Calculating 99% CI

Insights

Major Inferences

4.5 Results when the same activity is performed for Age

Calculating CIs (90%, 95%, 99%) for purchases by Age using the CLT
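Under the CLT, the sample mean is approximately normal, so an interval for the mean can be written as mean ± z × (standard deviation / √n). The sketch below applies that formula per group. The age labels and amounts are made up, the helper name `clt_mean_ci` is hypothetical, and the tiny samples only illustrate the arithmetic (the approximation needs far larger samples to be reliable).

```python
import math
from statistics import NormalDist, mean, stdev

def clt_mean_ci(values, confidence=0.95):
    """Normal-approximation CI for the mean of `values`."""
    n = len(values)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = mean(values)
    standard_error = stdev(values) / math.sqrt(n)
    return center - z * standard_error, center + z * standard_error

# Hypothetical purchase amounts keyed by made-up age brackets.
purchases_by_age = {
    "group_1": [10, 20, 30, 40, 50],
    "group_2": [15, 25, 35, 45, 55, 65],
}

for level in (0.90, 0.95, 0.99):
    for age_group, amounts in purchases_by_age.items():
        low, high = clt_mean_ci(amounts, confidence=level)
        print(f"{age_group} {level:.0%} CI: ({low:.2f}, {high:.2f})")
```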

Calculating 90% CI

Checking the sampling distribution of the sample mean for each age group (90% CI)

Calculating 95% CI

Checking the sampling distribution of the sample mean for each age group (95% CI)

Calculating 99% CI

Checking the sampling distribution of the sample mean for each age group (99% CI)

Major Inferences

5. Final Insights

Based on EDA

Based on CLT & CI

6. Recommendations